RESEARCH ARTICLES Gene Cluster Statistics with Gene Families

Authors

  • Narayanan Raghupathy
  • Dannie Durand
Abstract

Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such "gene clusters" is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. We present additional simulation results indicating the best choice of parameter values for data analysis in genomes of various sizes and illustrate the utility of our methods by applying them to gene clusters recently reported in the literature. Mathematica code to compute cluster probabilities using our methods is available as supplementary material.
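The effect of gene family size described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' analytical method or their Mathematica code. It is a toy null model written for illustration, and every function name, parameter name, and numeric value in it is an assumption. It treats two independent genomes (the orthologous case), assumes all gene families have equal size, places genes uniformly at random, and estimates how often two windows share at least a given number of gene families by chance.

```python
import random


def make_genes(n_families, family_size):
    """Return one label per gene; genes in the same family share a label."""
    return [fam for fam in range(n_families) for _ in range(family_size)]


def cluster_probability(n_families, family_size, window, min_shared,
                        trials, seed=0):
    """Estimate P(two random windows share >= min_shared gene families).

    Toy null model (an assumption of this sketch): gene order is uniformly
    random, so the contents of a window of `window` adjacent genes are
    distributed like a random sample of `window` genes without replacement.
    """
    rng = random.Random(seed)
    genes = make_genes(n_families, family_size)
    hits = 0
    for _ in range(trials):
        # One window from each of two independently shuffled genomes.
        window_a = rng.sample(genes, window)
        window_b = rng.sample(genes, window)
        # Count families that have at least one member in both windows.
        shared = len(set(window_a) & set(window_b))
        if shared >= min_shared:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Same genome size (600 genes) under two family-size assumptions.
    p_single = cluster_probability(600, 1, window=30, min_shared=3, trials=2000)
    p_family = cluster_probability(200, 3, window=30, min_shared=3, trials=2000)
    print("families of size 1:", p_single)
    print("families of size 3:", p_family)
```

Under these assumed settings, chance sharing is more frequent when families are larger, so a test that treats every gene as a singleton would understate the null probability and overstate significance, which is the qualitative point the abstract makes.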




Publication date: 2009